There is growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet. UEs are training samples to which invisible but unlearnable noise has been added, and they have been found to prevent unauthorized training of machine learning models. UEs are typically generated via a bilevel optimization framework with a surrogate model to remove (minimize) errors from the original samples, and are then applied to protect the data against unknown target models. However, existing UE generation methods all rely on an ideal assumption called label-consistency, where the hackers and protectors are assumed to hold the same label for a given sample. In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. For example, an m-class unlearnable dataset held by the protector may be exploited by the hacker as an n-class dataset. Existing UE generation methods are rendered ineffective in this challenging setting. To tackle this challenge, we present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations. Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains. We empirically verify the effectiveness of our proposed approach under a variety of settings with different datasets, target models, and even the commercial platforms Microsoft Azure and Baidu PaddlePaddle.
translated by Google Translate
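The cluster-wise perturbation idea in the Unlearnable Clusters abstract can be illustrated with a minimal sketch. It assumes that samples are grouped by nearest centroid in some surrogate feature space and that every sample in a cluster receives the same bounded noise pattern. The function names, the `eps` budget of 8/255, and the random placeholder noise are all hypothetical; the abstract does not specify how the noise is optimized, so that step is not shown.

```python
import numpy as np

def assign_clusters(features, centroids):
    """Return the index of the nearest centroid for each feature row."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def apply_cluster_noise(images, cluster_ids, cluster_noise, eps=8 / 255):
    """Add each sample's cluster-level noise, bounded in L-infinity norm by eps."""
    # Every sample in the same cluster shares one noise pattern (label-agnostic).
    noise = np.clip(cluster_noise[cluster_ids], -eps, eps)
    # Keep the perturbed images inside the valid pixel range [0, 1].
    return np.clip(images + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))        # hypothetical surrogate-model features
centroids = features[[0, 3]]              # two hypothetical cluster centres
images = rng.uniform(size=(6, 3, 8, 8))   # toy images with pixels in [0, 1]
# Placeholder noise only: a real method would optimize this per cluster.
cluster_noise = rng.uniform(-1, 1, size=(2, 3, 8, 8))

cluster_ids = assign_clusters(features, centroids)
protected = apply_cluster_noise(images, cluster_ids, cluster_noise)
```

Because the noise is tied to clusters rather than to class labels, the protection in this sketch does not depend on which labels a third party later assigns to the data.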
The success of deep learning heavily relies on large-scale data with comprehensive labels, which are more expensive and time-consuming to collect for 3D than for 2D images or natural language. This motivates using models pretrained on data from modalities other than 3D as teachers for cross-modal knowledge transfer. In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training Autoencoders as Cross-Modal Teachers (ACT). The pretrained Transformers are transferred as cross-modal 3D teachers using discrete variational autoencoding self-supervision, during which the Transformers are frozen with prompt tuning for better knowledge inheritance. The latent features encoded by the 3D teachers are used as the target of masked point modeling, wherein the dark knowledge is distilled to the 3D Transformer students as foundational geometry understanding. Our ACT pretrained 3D learner achieves state-of-the-art generalization capacity across various downstream benchmarks, e.g., 88.21% overall accuracy on ScanObjectNN. Code will be released at https://github.com/RunpeiDong/ACT.
Deep learning (DL) methods have been widely applied to anomaly-based network intrusion detection systems (NIDSs) to detect malicious traffic. To expand the usage scenarios of DL-based methods, the federated learning (FL) framework allows multiple users to train a global model while respecting individual data privacy. However, it has not yet been systematically evaluated how robust FL-based NIDSs are against existing privacy attacks under existing defenses. To address this issue, we propose two privacy evaluation metrics designed for FL-based NIDSs: (1) a privacy score that evaluates the similarity between the original and recovered traffic features using reconstruction attacks, and (2) an evasion rate against NIDSs using a Generative Adversarial Network-based adversarial attack with the reconstructed benign traffic. We conduct experiments to show that existing defenses provide so little protection that the corresponding adversarial traffic can even evade the SOTA NIDS Kitsune. To defend against such attacks and build a more robust FL-based NIDS, we further propose FedDef, a novel optimization-based input perturbation defense strategy with a theoretical guarantee. It achieves both high utility by minimizing the gradient distance and strong privacy protection by maximizing the input distance. We experimentally evaluate four existing defenses on four datasets and show that our defense outperforms all the baselines in terms of privacy protection, with up to 7 times higher privacy score, while keeping model accuracy loss within 3% under the optimal parameter combination.
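In this paper, we empirically study how to make the most of low-resolution frames for efficient video recognition. Existing methods mainly focus on developing compact networks or reducing the temporal redundancy of video inputs to improve efficiency, whereas compressing frame resolution has rarely been considered a promising solution. A major concern is the poor recognition accuracy on low-resolution frames. We therefore start by analyzing the underlying causes of performance degradation on low-resolution frames. Our key finding is that the main cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale. Motivated by the success of knowledge distillation (KD), we propose to bridge the gap between network and input size via cross-resolution KD (ResKD). Our work shows that ResKD is a simple but effective method to boost recognition accuracy on low-resolution frames. Without bells and whistles, ResKD considerably surpasses all competing methods in terms of efficiency and accuracy on four large-scale benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, and Something-Something V2. In addition, we extensively demonstrate its effectiveness on state-of-the-art architectures, i.e., 3D-CNNs and Video Transformers, and its scalability to super-low-resolution frames. The results suggest that ResKD can serve as a general inference acceleration method for state-of-the-art video recognition. Our code will be available at https://github.com/cvmi-lab/reskd.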
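As an illustration of the cross-resolution distillation idea, the following is a minimal sketch of a standard temperature-scaled KD loss, assuming the teacher logits come from high-resolution frames and the student logits from low-resolution frames of the same clips. The function names, the temperature of 4.0, and the toy logits are assumptions; the abstract does not state the exact loss used by ResKD.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Batch-mean KL(teacher || student) on temperature-softened outputs."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_teacher) * (log_p_teacher - log_p_student)).sum(axis=-1)
    # The T^2 factor is the usual rescaling that keeps gradient magnitudes
    # comparable across temperatures.
    return float(kl.mean() * temperature ** 2)

# Toy logits for a batch of 2 clips and 3 classes (illustrative values only).
teacher_logits = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])  # high-resolution input
student_logits = np.array([[2.0, 1.5, 0.5], [0.5, 1.0, 0.8]])  # low-resolution input

loss = kd_loss(student_logits, teacher_logits)
```

In practice this term would typically be combined with a cross-entropy loss on the ground-truth labels, with only the low-resolution student used at inference time.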
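Graph-to-text (G2T) generation and text-to-graph (T2G) triple extraction are two essential tasks for constructing and applying knowledge graphs. Existing unsupervised approaches turn out to be suitable candidates for jointly learning the two tasks because they avoid using graph-text parallel data. However, they are composed of multiple modules and still require both entity information and relation types during training. To this end, we propose Infinity, a simple yet effective unsupervised approach that does not require external annotation tools or additional parallel information. It achieves fully unsupervised graph-text mutual conversion for the first time. Specifically, Infinity treats both G2T and T2G as a bidirectional sequence generation task by fine-tuning only one pretrained seq2seq model. A novel back-translation-based framework is then designed to automatically generate continuous synthetic parallel data. To obtain reasonable graph sequences with structural information from source texts, Infinity employs a reward-based training loss that leverages the advantage of reward augmented maximum likelihood. As a fully unsupervised framework, Infinity is empirically verified to outperform state-of-the-art baselines on the G2T and T2G tasks.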
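Due to non-stationarity, the distribution of real-world multivariate time series (MTS) changes over time, which is known as distribution drift. Most existing MTS forecasting models suffer greatly from distribution drift, and their forecasting performance degrades over time. Existing methods address distribution drift by adapting to the latest arrived data or by self-correcting according to meta-knowledge derived from future data. Despite their great success in MTS forecasting, these methods can hardly capture the intrinsic distribution changes, especially from a distributional perspective. Accordingly, we propose a novel framework, the temporal conditional variational autoencoder (TCVAE), to model the dynamic distributional dependencies over time between historical observations and future data in MTS, and to infer these dependencies as a temporal conditional distribution by leveraging latent variables. Specifically, a novel temporal Hawkes attention mechanism represents temporal factors, which are subsequently fed into feed-forward networks to estimate the prior Gaussian distribution of the latent variables. The representation of temporal factors further dynamically adjusts the structures of the Transformer-based encoder and decoder to distribution changes by leveraging a gated attention mechanism. Moreover, we introduce conditional continuous normalizing flows to transform the prior Gaussian into a complex and form-free distribution, facilitating flexible inference of the temporal conditional distribution. Extensive experiments on six real-world MTS datasets demonstrate TCVAE's superior robustness and effectiveness over state-of-the-art MTS forecasting baselines. We further illustrate the applicability of TCVAE through multifaceted case studies and visualizations in real-world scenarios.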
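It has been observed that the unauthorized use of face recognition systems raises privacy problems. Using adversarial perturbations provides one possible solution to this issue. A critical problem in exploiting adversarial perturbations against unauthorized face recognition systems is that images uploaded to the web are processed by JPEG compression, which weakens the effectiveness of the adversarial perturbations. Existing JPEG compression-resistant methods fail to achieve a balance among compression resistance, transferability, and attack effectiveness. To this end, we propose a more natural solution called low-frequency adversarial perturbation (LFAP). Instead of restricting the adversarial perturbations, we regularize the source model to employ more low-frequency features through adversarial training. Moreover, to better influence the model in different frequency components, we propose the refined low-mid frequency adversarial perturbation (LMFAP), which considers the mid-frequency components as a productive complement. We design a variety of settings in this study to simulate real-world application scenarios, including cross backbones, supervisory heads, training datasets, and testing datasets. Quantitative and qualitative experimental results validate the effectiveness of the proposed solutions.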
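Although vision-language pre-training (VLP) models have shown revolutionary improvements on various vision-language (V+L) tasks, their adversarial robustness remains largely unexplored. This paper studies adversarial attacks on popular VLP models and V+L tasks. First, we analyze the performance of adversarial attacks under different settings. By examining the influence of different perturbed objects and attack targets, we draw some key observations that serve as guidance for both designing strong multimodal adversarial attacks and constructing robust VLP models. Second, we propose a novel multimodal attack method on VLP models called Collaborative Multimodal Adversarial Attack (Co-Attack), which collectively carries out attacks on the image modality and the text modality. Experimental results demonstrate that the proposed method achieves improved attack performance on different V+L downstream tasks and VLP models. The analysis observations and the novel attack method will hopefully provide new understanding of the adversarial robustness of VLP models, thereby contributing to their safe and reliable deployment in more real-world scenarios.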
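Neural radiance fields (NeRF) have shown great success in modeling 3D scenes and synthesizing novel-view images. However, most previous NeRF methods take a long time to optimize a single scene. Explicit data structures, e.g., voxel features, show great potential for accelerating the training process. However, voxel features face two major challenges when applied to dynamic scenes, i.e., modeling temporal information and capturing different scales of point motion. We propose a radiance field framework that represents scenes with time-aware voxel features, named TiNeuVox. A tiny coordinate deformation network is introduced to model coarse motion trajectories, and temporal information is further enhanced in the radiance network. A multi-distance interpolation method is proposed and applied to voxel features to model both small and large motions. Our framework significantly accelerates the optimization of dynamic radiance fields while maintaining high rendering quality. Empirical evaluation is performed on both synthetic and real scenes. Our TiNeuVox completes training in only 8 minutes with an 8 MB storage cost, while showing similar or even better rendering performance than previous dynamic NeRF methods.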
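Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually at low resolution because of the computational cost. In this paper, we present a novel transformer-based model for large-hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our framework to guarantee the high fidelity and diversity of recovered images. Specifically, we customize an inpainting-oriented transformer block, where the attention module aggregates non-local information only from partial valid tokens, indicated by a dynamic mask. Extensive experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets. Code is released at https://github.com/fenglinglwb/mat.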
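The attention restriction described above, where information is aggregated only from valid tokens, can be sketched as masked scaled dot-product attention. This is a generic single-head illustration, not the released implementation: the function name, shapes, and toy mask are assumptions, and the rule for updating the mask between blocks (the "dynamic" part) is not shown.

```python
import numpy as np

def masked_attention(q, k, v, valid):
    """Single-head attention that ignores keys flagged as invalid.

    q, k, v: arrays of shape (n_tokens, dim).
    valid:   boolean array of shape (n_tokens,); at least one entry must be True.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Invalid (hole) tokens get -inf so their softmax weight is exactly zero.
    scores = np.where(valid[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, dim = 5, 8
q = rng.normal(size=(n_tokens, dim))
k = rng.normal(size=(n_tokens, dim))
v = rng.normal(size=(n_tokens, dim))
valid = np.array([True, True, False, True, False])  # tokens 2 and 4 lie in the hole

out, weights = masked_attention(q, k, v, valid)
```

Every query token, including those inside the hole, receives an output, but that output is a mixture of values from valid tokens only.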